Bayesian Protein Structure Alignment.
Authors
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, which makes statistical interpretation difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
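To make the link between heuristic alignment scores and maximum a posteriori estimation more concrete, the following is a minimal sketch, not the authors' model. It assumes the two structures are already superposed, an isotropic Gaussian error model with standard deviation `sigma` for matched residues, and a constant prior cost per gap. Under those assumptions the log-posterior of an alignment is, up to constants, a sum of match terms minus gap penalties, so a Needleman-Wunsch-style dynamic program returns the MAP alignment. The names `map_alignment`, `match_bonus`, and `gap_penalty` are hypothetical and chosen only for illustration.

```python
def map_alignment(X, Y, sigma=1.0, match_bonus=4.0, gap_penalty=1.0):
    """Globally align two lists of 3D coordinates that are already superposed.

    Assumed model (illustrative only): a matched pair contributes
    match_bonus - d^2 / (2 * sigma^2), i.e. a Gaussian log-likelihood plus a
    constant, and every unmatched residue costs gap_penalty (a negative log
    prior). Returns the best score and the list of matched index pairs.
    """
    n, m = len(X), len(Y)

    def match(i, j):
        d2 = sum((a - b) ** 2 for a, b in zip(X[i], Y[j]))
        return match_bonus - d2 / (2 * sigma ** 2)

    # S[i][j] = best score for aligning the first i residues of X
    # with the first j residues of Y.
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        S[0][j] = -gap_penalty * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + match(i - 1, j - 1),  # match X[i-1] with Y[j-1]
                S[i - 1][j] - gap_penalty,              # leave X[i-1] unmatched
                S[i][j - 1] - gap_penalty,              # leave Y[j-1] unmatched
            )

    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + match(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    return S[n][m], pairs[::-1]


# Toy example: Y is X with its first residue missing.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
Y = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
score, pairs = map_alignment(X, Y)
```

In this toy case the best alignment skips the first residue of `X` and matches the remaining three exactly. Raising `gap_penalty` far enough would instead favor matching residues at larger distances, which is one way to see why the abstract describes the gap parameters as critical to alignment quality.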
Related resources
Inferring Site-Specific Evolutionary Rates: Bayesian Methods are Superior
Not all sites in protein sequences evolve at the same rate during the course of evolution. The working hypothesis assumes that evolutionary conserved sites in a protein sequence point to functionally important regions. Thus, identifying conserved regions in a protein impacts such areas as functional annotation studies, drug design, quaternary structure prediction, and protein biochemistry [1]. ...
Fast Bayesian Shape Matching Using Geometric Algorithms
We present a Bayesian approach to comparison of geometric shapes with applications to classification of the molecular structures of proteins. Our approach involves the use of distributions defined on transformation invariant shape spaces and the specification of prior distributions on bipartite matchings. Here we emphasize the computational aspects of posterior inference arising from such model...
Bayesian Alignment of Similarity Shapes.
We develop a Bayesian model for the alignment of two point configurations under the full similarity transformations of rotation, translation and scaling. Other work in this area has concentrated on rigid body transformations, where scale information is preserved, motivated by problems involving molecular data; this is known as form analysis. We concentrate on a Bayesian formulation for statisti...
In Silico Analysis of Primary Sequence and Tertiary Structure of Lepidium Draba Peroxidase
Peroxidase enzymes are widely applicable in industry and diagnosis. Recently, we introduced a new kind of peroxidase gene from Lepidium draba (LDP). According to protein multiple sequence alignment results, LDP had 93% similarity and 88.96% identity with horseradish peroxidase C1A (HRP C1A). In the current study we employed in silico tools to determine to which group of peroxidase enzymes LDP...
Predicting protein secondary structure with probabilistic schemata of evolutionarily derived information
We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn spe...
Bayesian Model of Protein Primary Sequence for Secondary Structure Prediction
Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein's function in the cell. Understanding a protein's secondary structure is a first step towards this goal. Therefore, a number of computational prediction methods have been developed to predict secondary structure ...
Journal title:
- The Annals of Applied Statistics
Volume 8, Issue 4
Pages -
Publication date: 2014